Don't let device failures or power outages ruin your training runs. In this tutorial, Yufeng Guo demonstrates how to use Keras with the Orbax checkpointing library. Learn how to implement a custom checkpoint manager and Keras callbacks so that your model state is saved during training and can be restored after an interruption.
0:00 Introduction to Orbax & Keras Integration
0:39 Exploring Keras Checkpointing
1:11 Why Extend Keras for Multi-Host Environments?
1:48 What is Orbax?
2:29 Building Utility Classes: KerasOrbaxCheckpointManager & OrbaxCheckpointCallback
2:57 Deep Dive into KerasOrbaxCheckpointManager
3:45 Coding the Get, Save, and Restore State Functions
4:37 Implementing the OrbaxCheckpointCallback
5:12 Protecting Against Device Failures & Preemption
5:31 Implementation Details & Model.fit Integration
6:07 Checkpointing in Action: File Directory Walkthrough
6:56 Summary & Final Tips
Resources:
Orbax checkpointing in Keras - Developer guide
ModelCheckpoint - Keras 3 API documentation
Speaker: Yufeng Guo
Products Mentioned: Google AI